Acta Orthopaedica 2013; 84 (2): 207-212 



Reliable classification of children's fractures according to 
the comprehensive classification of long bone fractures by 
Muller 



207 



Terje Meling 1 , Knut Harboe 1 , Cathrine H Enoksen 1 , Morten Aarflot 2 , Astvaldur J Arthursson 1 , 
and Kjetil Soreide 3 ' 4 



1 Department of Orthopedic Surgery, 2 Norwegian Centre for Movement Disorders, and department of Surgery, Stavanger University Hospital, Stavanger; 
department of Surgical Sciences, University of Bergen, Bergen, Norway 
Correspondence: Terje Meling meteee@sus.no 
Submitted 12-06-14. Accepted 12-09-25 



Background and purpose Guidelines for fracture treatment and 
evaluation require a valid classification. Classifications especially 
designed for children are available, but they might lead to reduced 
accuracy, considering the relative infrequency of childhood frac- 
tures in a general orthopedic department. We tested the reliability 
and accuracy of the Muller classification when used for long bone 
fractures in children. 

Methods We included all long bone fractures in children 
aged < 16 years who were treated in 2008 at the surgical ward 
of Stavanger University Hospital. 20 surgeons recorded 232 frac- 
tures. Datasets were generated for intra- and inter-rater analysis, 
as well as a reference dataset for accuracy calculations. We pres- 
ent proportion of agreement (PA) and kappa (K) statistics. 

Results For intra-rater analysis, overall agreement (K) was 
0.75 (95% CI: 0.68-0.81) and PA was 79%. For inter-rater assess- 
ment, K was 0.71 (95% CI: 0.61-0.80) and PA was 77%. Accuracy 
was estimated: K = 0.72 (95% CI: 0.64-0.79) and PA = 76%. 

Interpretation The Muller classification (slightly adjusted 
for pediatric fractures) showed substantial to excellent accuracy 
among general orthopedic surgeons when applied to long bone 
fractures in children. However, separate knowledge about the 
child-specific fracture pattern, the maturity of the bone, and the 
degree of displacement must be considered when the treatment 
and the prognosis of the fractures are evaluated. 



Long bone fractures are the main reason for emergency admis- 
sion of children to orthopedic departments (Deakin et al. 
2007). Fracture classification is essential for comparison of 
epidemiological details and for quality assurance of different 
fracture treatment algorithms. Until recently, multiple classifi- 
cation systems based on anatomical segments or morphologi- 
cal patterns of fracture were used simultaneously to describe 
long bone fractures. The Salter-Harris classification of lesions 



involving the physeal plate and the Gartland classification 
of distal humeral fractures are well-known examples (Gart- 
land 1959, Salter 1963). Some childhood fracture types and 
segments have several available classification systems while 
others have none. 

The Muller comprehensive classification of long bone 
fractures (Muller et al. 1990) (Figure 1) was developed as an 
overall fracture classification system, and has been adapted 
for adult long bone fractures by the Arbeitsgemeinschaft fur 
Osteosynthesefragen (AO) and by the Orthopedic Trauma 
Association (OTA) (Marsh et al. 2007). However, it has not 
been used widely in the classification of pediatric fractures. 
This system does not cover some important aspects of frac- 
tures in children. The pediatric skeleton is softer, is more 
elastic, and includes the non-calcified growth plates and the 
partially calcified epiphysis. Consequently, depending on the 
maturity of the bone and the trauma mechanism involved, the 
bone gives way differently. Very often, at least part of the bone 
is deformed rather than broken apart, resulting in fractures 
with specific patterns in children — including bowing, buckles, 
and green-stick fractures. Moreover, the growth plate is less 
rigid than the surrounding bone, creating a stress riser, and it 
is therefore injured relatively frequently. 

AO introduced a child- specific classification system — the 
AO pediatric comprehensive classification of long bone frac- 
tures (PCCF)— in 2006 (Slongo et al. 2006). Licht und Lachen 
fur kranke Kinder (Li-La) recently introduced an alternative 
classification system, the Li-La classification (Schneidmuller 
et al. 2011). Both systems are based on previous attempts 
to modify the Muller classification of children's fractures 
(Slongo et al. 1995, von Laer et al. 2000). They also incorpo- 
rate well-established classification systems such as the Salter- 
Harris and Gartland classifications. In the last 2 decades, there 
has been major concern about the reliability of most known 
classification systems. Consequently, the PCCF has been stud- 
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Figure 1 . The Muller classification of long bone fractures. 

ied according to a 3 -phase validation concept, as introduced 
by Audige et al. (2005). The results have been promising, at 
least among experts (Slongo et al. 2007a,b). In the third phase, 
which has not yet been performed, the classification should be 
tested in prospective clinical studies to assess its implications 
for treatment options and outcome. 

Starting in January 2004, all inpatient procedures per- 
formed for both adult and pediatric fractures of long bones 
were classified according to a slightly adjusted Muller clas- 
sification (Figure 1) and reported to the Fracture and Dislo- 
cation Registry of Stavanger University Hospital (Meling et 
al. 2009, 2010). Other comprehensive classification systems 
were scarcely established for pediatric fractures at that time. 
We have analyzed the reliability and accuracy of the Muller 
classification as applied to childhood fractures. 



Patients and methods 

Stavanger University Hospital (SUH) serves as the only pri- 
mary emergency care facility in the region. The catchment 
area consists of a mixed urban and rural population of approx- 
imately 317,000 inhabitants, of which 73,000 (23%) are below 
16 years of age. Adult fractures have been considered seper- 
ately elsewhere (Meling et al. 2012). 

All orthopedic surgeons working for the hospital perform 
pediatric operations/reductions irrespective of their other 
orthopedic subspecialty. 242 pediatric long bone fractures 
were reported during the study year (2008). 1 pathological 
fracture (bone cyst) was excluded. 3 patients with synchro- 
nous ipsilateral fractures were excluded. 6 fractures were 
excluded because radiographs were not accessible for re-eval- 
uation. Thus, 232 long bone fractures were considered for re- 
evaluation and were included in the study. 



Group 
3 



20 of the 23 surgeons who contributed to the 
original dataset were still working in the depart- 
ment and were available to participate in the re- 
scoring. Thus, 184 (79%) of the 232 fractures 
were included in the intra-rater analysis (Tables 
1 and 2). 

Intra- and inter-rater reliability and accuracy 
calculations are presented as both percentage of 
agreement and kappa statistics. Intra-rater refers 
to a situation where the same observer, on sepa- 
rate occasions, classifies a fracture. Inter-rater 
refers to a situation in which the same cases are 
rated by different observers. Agreement indi- 
cates how similar the fracture classification data- 
sets are, and it is measured as the percentage of 
even ratings (the proportion of agreement; PA) 
between each dataset. Reliability refers to how 
similar the datasets are relative to the similarity 
expected to occur by chance alone. Reliability 
was measured by kappa statistics (K). Accuracy 
refers to the correctness of the dataset when compared to a 
reference dataset. 

The original fracture codes were reported by the surgeon 
in charge of each operation in the study period. Operation 
notes and perioperative radiographs of the same fractures (but 
not the original code) were presented to the same surgeons 
in a corresponding manner in November 2009. The resulting 
dataset was compared to the original code during the calcu- 
lation of intra-rater agreement and reliability. The fractures 
treated by surgeons who no longer worked at the institution 
(in November 2009) were excluded from parts of the analysis. 
A randomized selection (50% of the fractures that were oper- 
ated by surgeons in 2008) was presented in the same way to 
an average experienced orthopedic resident (with 3 years of 
orthopedic training). The resulting dataset was compared to 
the original dataset to calculate the inter-observer agreement 
and reliability. 

All the original codes were checked (unblinded) and re- 
coded, as deemed necessary by an experienced trauma ortho- 
pedic surgeon. Only the fractures that the first expert re-coded 
were presented to another experienced orthopedic trauma sur- 
geon. Where the experts' preliminary coding did not agree, 
they reviewed the fractures together, making a final consensus 
code. The resulting reference code dataset was compared to 
the other datasets when accuracy calculations were performed. 

We used the first 4 signs of the Muller classification of 
long bone fractures (Muller et al. 1990) (Figure 1). Only 2 
modifications to the classification were required. First, the 
definition of fracture was slightly altered, such that bending 
and incomplete disruptions of the cortices were considered as 
fractures. Secondly, because ossification of the epiphysis is 
age-dependent, the extent of the bone is difficult to evaluate 
from plain radiographs (Figure 2). Consequently, the growth 
plate was considered as the distal/ proximal marking when the 



Acta Orthopaedica 2013; 84 (2): 207-212 



209 



Forearm shaft/ distal forearm? 




■ The rule of Muller (adult) 
The rule of the AO-classification for children 
The rule of Li-La classification for this study 

Figure 2. The rule of the square: "The proximal and distal segments of 
long bones are defined by a square whose sides are the same length 
as the widest part of the epiphysis" (Muller et al. 1990). Muller classifi- 
cation: The width defined by both bones. The reference line defined as 
the most distal (or proximal) part of the bone. Li-La classification (and 
in this study): The width defined by one bone (radius). The reference 
line defined as the epiphyseal plate. 

AO pediatric classification: The width defined by both bones. The ref- 
erence line defined as the epiphyseal plate. The proximal lines of the 
squares define the border between the diaphysis and the metaphysis. 
The fracture illustrated is defined as a forearm shaft fracture accord- 
ing to the Muller and Li-La classifications (and in this study), and as 
a distal forearm fracture according to the AO pediatric classification. 

"rule of square" was used. Like the Li-La classification, and 
in contrast to the PCCF, we did not include pairs of bones, i.e. 
radius/ulna and tibia/fibula, in the square (Slongo et al. 2006, 
Schneidmuller et al. 2011). 

Reliability was measured according to kappa statistics. 
Kappa values range from -1 (no agreement) to 1 (complete 
agreement). A value of 0 indicates no better agreement than 
expected by chance alone. The guidelines of Landis and Koch 
were used when the results were analyzed (K = 0.81-1.00: 
excellent; K = 0.61-0.80 substantial; K = 0.41-0.60 moderate; 
and K = 0.21-0.40: fair) (Landis and Koch 1977). 

Statistics 

The software packages SPSS version 15 and R version 12.2.2 



(http://www.r-project.org) were used for statistical analyses. 
Selection of fractures for inter-rater analysis was performed in 
SPSS by randomization. 

Intra- and inter-rater reliability are presented using Cohen's 
kappa (kappa agreement), which was calculated in R. 12.2.2 
using the package psy and irr (Falissard 2009). 95% CIs were 
estimated according to an adjusted bootstrap percentile CI by 
using a bootstrap CI of Light's kappa (Efron and Tibshirani 
1993, Gwet 2010). 

Ethics 

The Norwegian Social Science Data Service approved the reg- 
istry. The Regional Ethics Committee gave its consent for the 
study on 21 June 2007 (number 152.07). 



Results 

146 of the 184 fractures were given the same classification 
code according to fracture group (4 signs of the classifica- 
tion), giving a PA of 79% and a kappa agreement of 0.75 (CI: 
0.68-0.81). 

In the inter-rater analysis, 108 pairs of fracture classification 
codes were analyzed. The PA was calculated as 77% (83 of 
108) and the kappa agreement as 0.71 (CI: 0.61-0.80) (Tables 
1 and 2). 

196 (84%) of the 232 codes in the original classification 
were accepted as correct by the first expert. The remaining 36 
fractures (16%) were presented to another expert. Of these, 
15 fractures were given the same codes by both experts. The 
remaining 21 fracture codes (9% of the total) were agreed on 
by consensus between the experts. 

201 (87%) of the 232 original classification codes were cor- 
rectly recorded according to the reference code dataset, giving 
a kappa agreement of 0.84 (CI: 0.78-0.88). Furthermore, 140 
of 184 of the surgeons' blinded re-codings (76%) were cor- 
rectly classified (Tables 1 and 2). The kappa agreement was 
calculated as 0.72 (CI: 0.64-0.79). Accuracy for the most fre- 
quent segments according to 3 and 4 signs in the Muller code 
is presented in Table 3. 



Table 1 . Overall agreement, reliability, and accuracy for all signs of the Muller comprehensive classification of long bone fractures in child- 
hood fractures 



Intra-observer reliability Inter-observer reliability Accuracy, unblinded Accuracy, blinded 

(1 84 pairs) (1 08 pairs) (232 pairs) (1 84 pairs) 

AO sign PA PE K PA PE K PA PE K PA PE K 

(%) (%) (95% CI) (%) (%) (95% CI) (%) (%) (95% CI) (%) (%) (95% CI) 

First sign (Bone) 99 52 0.99 100 NaN 1.00 100 NaN 1.00 99 52 0.99 

Two signs (Segment) 91 25 0.88 (0.82-0.93) 94 26 0.91 (0.83-0.96) 94 26 0.92 (0.87-0.95) 91 25 0.88 (0.82-0.93) 

Three signs (Type) 89 23 0.86 (0.79-0.91) 88 24 0.84 (0.75-0.91) 91 25 0.89 (0.83-0.93) 86 23 0.82 (0.76-0.88) 

All signs (Group) 79 16 0.75(0.68-0.81) 77 19 0.71(0.61-0.80) 87 18 0.84(0.78-0.88) 76 16 0.72 (0.64-0.79) 



PA: observed proportion of agreement; PE: the proportion of agreement expected by chance; K: Cohen's kappa agreement. 
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Table 2. Agreement, reliability, and accuracy according to each sign in the Muller classification of long bone childhood fractures. Only the 
codes that were given the same classification code at the previous signs were considered when the next sign was calculated 



Intra-observer Inter-observer Accuracy, unblinded Accuracy, blinded 



AO-code 


n/N 


PA 

% 


K 

(95% CI) 


n/N 


PA 

% 


K 

(95% CI) 


n/N 


PA 

% 


K 

(95% CI) 


n/N 


PA 

% 


K 

(95% CI) 


First sign 


183/184 


100 


0.99 


108/108 


100 


1.00 


232/232 


100 


1.00 


183/184 


100 


0.99 


(Bone) 






(0.93-1 .00) 






(1 .00-1 .00) 






(1 .00-1 .00) 






(0.94-1 .00) 


Second sign 


168/183 


92 


0.86 


101/108 


94 


0.87 


218/232 


94 


0.89 


167/183 


91 


0.86 


(Segment) 






(0.79- 0.92) 






(0.74-0.95) 






(0.83-0.94) 






(0.78-0.92) 


Third sign 


164/168 


98 


0.90 


95/101 


94 


0.78 


212/218 


97 


0.88 


159/167 


95 


0.81 


(Type) 






(0.77-0.97) 






(0.59-0.92) 






(0.78-0.95) 






(0.68-0.91) 


Fourth sign 


146/164 


89 


0.82 


83/95 


87 


0.80 


201/212 


95 


0.92 


140/159 


88 


0.8 


(Group) 






(0.74-0.90) 






(0.67-0.90) 






(0.85-0.95) 






(0.71-0.88) 



n: even coded numbers; N: total of coded fractures; PA: proportion of agreement (proportion of correctness); K, kappa agreement. 



Table 3. Accuracy of the surgeons' blinded re-coding for the most frequent bone segments 
according to 3 and 4 signs of the classification 









Muller type (3 signs) 


Muller group (4 signs) 




Bone segment 




PA 


K 


PA 


K 




"Muller code" 




(%) 


(95% CI) 


(%) 


(95% CI) 




Distal humerus "13" 




22/24 (92) 


0.82 (0.59 to 1 .00) 20/24 (83) 


0.73 (0.49-0.97) 


Forearm shaft "22" 




50/52 (96) 


0.49 (-0.20 to 1 .00) 47/52 (90) 


0.77 (0.57-0.96) 


Distal forearm "23" 




60/72 (83) 


0.00 (-0.51 to 0.51) 48/72 (67) 


0.16 (O.H- 


0.43) 


PA: proportion of agreement (proportion of correctness); K, kappa agreement. 






(Fractures of the other bone segments are not presented because of the small numbers). 




Table 4. Distribution of the fractures according to the reference dataset 




Type/ Group 


A1 


A2 


A3 B1 


B2 B3 C1 C2 


CS 


2 


Proximal humerus 






1 






1 


Humeral shaft 












0 


Distal humerus 


2 


14 


3 7 


2 




28 


Proximal forearm 




2 


2 






4 


Forearm shaft 


1 


11 


51 1 


1 


1 


66 


Distal forearm 




80 


15 






95 


Proximal femur 






1 


2 




3 


Subtrochanteric 




1 


1 






2 


Femoral shaft a 


1 




1 2 






4 


Distal femur 


2 




1 






3 


Proximal tibia 


5 










5 


Tibial shaft 




2 


2 






4 


Distal tibia 


6 




4 


1 




11 


Ankle 


1 


1 


1 


1 1 1 




6 


Total 


18 


111 


73 20 


2 16 0 


1 


232 


a Excluding the subtrochanteric fractures. 



Discussion 

According to the most frequently used guidelines for interpre- 
tation of kappa agreement (Landis and Koch 1977), the intra- 
and inter-rater reliability and accuracy of the Muller classifi- 
cation were excellent when considering three signs of the clas- 



sification and substantial when four 
signs were considered. When each 
sign of the classification was con- 
sidered individually, most kappa 
values were excellent (Table 2). 

There are many pitfalls in per- 
forming a reliability study, espe- 
cially when it comes to the inter- 
pretation of kappa values (Audige 
et al. 2004, Sim and Wright 2005, 
Karanicolas et al. 2009). The inci- 
dence of the different fractures 
varied considerably (Table 4). 
Consequently, our study does not 
permit interpretation of details in 
the subclassification; the result- 
ing CIs were too wide. However, 
interpretation of the general appli- 
cability of the classification should 
be justified, as illustrated by the 
narrow CIs (Tables 1 and 2). 

Determinination of the second 
sign of the Muller classification 
proved to be particularly difficult 
in childhood fractures (Table 2). 
Reviewing details of the surgeons' 
second dataset, 12 of the distal 
forearm fractures were misclas- 
sified as forearm shaft fractures. 
None of the forearm shaft fractures 
were misclassified as distal fore- 
arm fractures. Difficulty in using 
the Muller "rule of the square" may be one reason for this 
problem (Muller et al. 1990) (Figure 2). Another reason might 
be that the surgeons believed that a distal antebrachial fracture 
(both bones) had to be recorded as a diaphyseal fracture. The 
first expert (TM), re-classified the 165 forearm fractures in a 
blind manner (data not shown) using the PCCF's "rule of the 



Acta Orthopaedica 2013; 84 (2): 207-212 



211 



square". The proportion of distal forearm fractures increased 
from 95 of 165 (58%) to 111 of 165 (67%). 

Consideration of the widths of both bones and not only the 
radius when using the rule of the square improved the accu- 
racy of classifying the fracture into epiphyseal (E), metaphy- 
seal (M), or diaphyseal (D) from a kappa value of 0.78 to one 
of 0.98 (Audige 2004). The latter finding has not been repro- 
duced among less experienced surgeons, whose results — split 
into kappa values for E, M, and D — were 0.66, 0.80, and 0.91, 
respectively (Slongo et al. 2007a). The corresponding articu- 
lar/non-articular classification of the Li-La classification was 
performed at an overall kappa value of 0.88 (Schneidmuller 
et al. 2011). Validation has also been evaluated according to 
the child- specific patterns. The settings of the child- specific 
patterns among PCCF experts were 0.92, 0.91, and 0.84 for E, 
M, and D, respectively (Audige 2004). However, for surgeons 
with average experience the corresponding kappa values were 
0.51, 0.63, and 0.48, respectively (Slongo et al. 2006). For the 
Li-La classification, the overall kappa for the specific child 
fracture code was 0.72 (Schneidmuller et al. 2011). These 
results are not easily compared to those in our study. However, 
generally speaking, the kappa values listed in Tables 1 and 2 
appear to exceed those in the latter studies (Slongo et al. 2006, 
Schneidmuller et al. 2011). 

To determine the treatment and the prognosis of a fracture, it 
is necessary to know how stable the fracture is and the possi- 
ble spontaneous correction of the displacement. This matter is 
often not entirely considered in classification systems because 
it may lead to poor reliability (Kreder et al. 1996). The Miiller 
classification, for instance, does not generally consider the 
level of displacement of the fracture fragments. The level of 
displacement is only partially considered in the PCCF (for 
supracondylar fractures of the humerus and proximal fractures 
of the radius), and the level of displacement and the maturity of 
the bone are generally considered in the Li-La (non-displaced/ 
tolerably displaced and non-tolerably displaced). In a registry 
setting, age and sex are recorded, thus the maturity of the bone 
might be considered — although the Miiller classification does 
not include this consideration. Child-specific fracture patterns 
such as buckle and green- stick fractures and different injuries 
to the growth zone reflect the stability and outcome of the 
fracture. The importance of classifying child- specific fracture 
patterns for treatment and outcome remains to be proven, as 
stated in step 3 in the 3 -phase validation concept of Audige 
et al. (Audige et al. 2005). Although the PCCF was presented 
in 2006 (Slongo et al. 2006) and the Li-La classification in 
2011 (Schneidmuller et al. 2011), the Miiller classification 
is still used in the Fracture and Dislocation Registry at our 
hospital for both pediatric and adult fractures (Miiller et al. 
1990). However, we consider to also register the child- specific 
fracture pattern, which would result in a registration close to 
what has already been proposed by Slongo et al. (Slongo et al. 
1995). The relatively few childhood fractures treated by each 
general orthopedic surgeon and the disadvantage of presenting 



2 separate classification systems to the surgeons reporting to 
the Fracture and Dislocation Registry at our hospital makes 
the introduction of an additional child-specific classification 
system less appropriate (Meling et al. 2012). 

In summary, reliable classification of pediatric long bone 
fractures is possible to perform, at group level (4 signs), 
according to a slightly adjusted Miiller classification. How- 
ever, the classification does not cover some important con- 
siderations needed for treatment and prognostic evaluation. 
Consequently, at least age and gender of the patient and child- 
specific pattern of the fracture should also be reported. 
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